Pattern-Guided Data Anonymization and Clustering
Abstract
A matrix M over a fixed alphabet is k-anonymous if every row in M has at least k − 1 identical copies in M. Making a matrix k-anonymous by replacing a minimum number of entries with an additional ?-symbol (called “suppressing entries”) is known to be NP-hard. This task arises in the context of privacy-preserving publishing. We propose an enhanced anonymization model and analyze its computational complexity; in this model, the user of the k-anonymized data may additionally “guide” the selection of the candidate matrix entries to be suppressed. The basic idea is to express this guidance by means of “pattern vectors” which are part of the input. This can also be interpreted as a sort of clustering process. The model is motivated by the observation that the “value” of matrix entries may differ significantly: losing one entry (by suppression) may be more harmful than losing another, and which loss is worse may depend heavily on the intended use of the anonymized data. We show that already very basic special cases of our new model lead to NP-hard problems, while others allow for (fixed-parameter) tractability results.
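The definition of k-anonymity and the effect of suppression can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: the function names `is_k_anonymous` and `suppress`, the sample matrix, and the encoding of a pattern vector as a tuple of booleans (True meaning "suppress this column for this row") are all assumptions made for illustration only.

```python
from collections import Counter

SUPPRESSED = "?"  # the additional ?-symbol used for suppressed entries


def is_k_anonymous(matrix, k):
    """Return True if every row occurs at least k times in the matrix."""
    counts = Counter(tuple(row) for row in matrix)
    return all(count >= k for count in counts.values())


def suppress(row, pattern):
    """Replace each entry whose pattern flag is True with the ?-symbol.

    The boolean-tuple encoding of a pattern is a hypothetical choice
    for this sketch, not the paper's formal definition.
    """
    return tuple(SUPPRESSED if hide else value
                 for value, hide in zip(row, pattern))


# Hypothetical toy data: rows 0 and 1 differ only in the second column.
matrix = [
    ("a", "x", 1),
    ("a", "y", 1),
    ("b", "z", 2),
    ("b", "z", 2),
]

# One pattern per row: hide column 1 for the first two rows only.
patterns = [
    (False, True, False),
    (False, True, False),
    (False, False, False),
    (False, False, False),
]

anonymized = [suppress(row, p) for row, p in zip(matrix, patterns)]

print(is_k_anonymous(matrix, 2))      # False: ("a", "x", 1) is unique
print(is_k_anonymous(anonymized, 2))  # True after two suppressions
```

Checking a given matrix is easy; the hard part named in the abstract is choosing which entries to suppress so that the total number of suppressions is minimal.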
Related resources
Pattern-Guided k-Anonymity
We suggest a user-oriented approach to combinatorial data anonymization. A data matrix is called k-anonymous if every row appears at least k times. The goal of the NP-hard k-ANONYMITY problem is then to make a given matrix k-anonymous by suppressing (blanking out) as few entries as possible. Building on previous work and coping with corresponding deficiencies, we describe an enhanced k-anonymiza...
Utility-guided Clustering-based Transaction Data Anonymization
Transaction data about individuals are increasingly collected to support a plethora of applications, spanning from marketing to biomedical studies. Publishing these data is required by many organizations, but may result in privacy breaches, if an attacker exploits potentially identifying information to link individuals to their records in the published data. Algorithms that prevent this threat ...
SACK: Anonymization of Social Networks by Clustering of K-edge-connected Subgraphs
In this paper, a method for anonymization of social networks by clustering of k-edge-connected subgraphs (SACK) is presented. Previous anonymization algorithms do not consider the distribution of nodes in the social network graph according to their attributes. SACK focuses on the observation that the probability of an edge existing between two nodes is related to their attributes, and this leads to a...
An efficient privacy protection in mobility social network services with novel clustering-based anonymization
Social communication among online users has become a trend with the rapid growth of social networks in the last few years. Facebook, Myspace, Twitter, LinkedIn, etc. have created huge amounts of data about interactions in social networks. Meanwhile, the trend also holds for offline scenarios, with the rapid growth of mobile devices such as smartphones, tablets, and laptops used for s...
Anonymization Based on Nested Clustering for Privacy Preservation in Data Mining
Privacy preservation in data mining protects data against unauthorized extraction of information. Data anonymization techniques implement this by modifying the data so that the original values cannot be acquired easily. Perturbation techniques are used in various forms, and they greatly affect the quality of the data, since there is a trade-off between privacy preservation and information loss...
Publication date: 2011